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Abstract 

A recently designed hyperspectral imaging device enables multiplexed acquisition of an 
entire data volume in a single snapshot thanks to monolithically-integrated spectral filters. 
Such an agile imaging technique comes at the cost of a reduced spatial resolution and the 
need for a demosaicing procedure on its interleaved data. In this work, we address both issues 
and propose an approach inspired by recent developments in compressed sensing and analysis 
sparse models. We formulate our superresolution and demosaicing task as a 3-D generalized 
inpainting problem. Interestingly, the target spatial resolution can be adjusted for mitigating 
the compression level of our sensing. The reconstruction procedure uses a fast greedy method 
called Pseudo-inverse IHT. We also show on simulations that a random arrangement of the 
spectral filters on the sensor is preferable to regular mosaic layout as it improves the quality 
of the reconstruction. The efficiency of our technique is demonstrated through numerical 
experiments on both synthetic and real data as acquired by the snapshot imager. 

Keywords: Hyperspectral, inpainting, iterative hard thresholding, sparse models, CMOS, 
Fabry-Perot. 


1 Introduction 

Hyperspectral Imaging (HSI) refers to the acquisition and processing of multichannel digital images 
of the spectrometry of every pixel of a scene, z.e., the intensity of light as a function of its 
wavelength. The knowledge of the spectrum can serve to identify the materials that are present in 
a particular pixel [T]. Many scientific applications of this technique exist in, e.g.^ remote sensing [2] 
and food science [3] with an increasing number of industrial and general purpose applications 
emerging due to the rapid simplification undergone by recent HSI systems [4]. 

Hyperspectral images, z.e., data volumes (or cubes) ordered in two spatial dimensions and one 
spectral, may contain several millions of voxels, partitioned in anywhere from a dozen up to hun¬ 
dreds of wavelengths. Traditional HSI systems require either line scanning, e.g., with a motorized 
translation stage, or spectral scanning with tunable filters [5]. Most often, this means a slow 
acquisition and a cumbersome device only suitable for imaging in controlled environments. More 
recently, snapshot HSI was introduced as a trade-off between spatial resolution and acquisition 
time; this imaging mode is obtained by multiplexing optically the 3-D spatial-spectral informa¬ 
tion on a 2-D CMOS photosensor array, over which a layer of Faby-Perot (FP) spectral filters is 
deposited laiHi- The main limitation of this approach is that the spatial resolution of the sensor 
array actually becomes limited by the number of acquired wavelengths. 

Compressed Sensing (CS) counts among the means used to cope with this resolution limit. 
When the acquisition process is appropriately designed, this mathematical framework leverages 
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the natural structure of a signal (e.^., its sparsity) to robustly reconstruct it from only a few non- 
adaptive measurements [9]. Compared to standard 2-D compressive imaging, HSI also offers an 
extra spectral dimension to exploit. CS is thus clearly an appealing objective for HSI, as noted by 
some compressive HSI designs unmi], some of which tackled explicitly an imaging architecture 
with a FP-filtered sensor array m- Nevertheless, these designs often require complex optical 
elements (e.^., spatial light modulation, beam splitter) that increases the global system complexity. 

The contributions of this work are multiple. First, our study is based on the snapshot imager 
from [6] and processes the resulting low-resolution data volume with an inpainting method to fill 
the missing information. The aim is to find a compromise between an instantaneous acquisition 
without scanning and a reasonably high target resolution with the aid of a recovery algorithm. 
Second, on the hardware side, we propose to slightly modify a simple snapshot imager design by 
randomly intermixing the FP filters on the sensor; this avoids aliasing effects due to subsampling 
on a regular grid. Simulations confirm that a random spatial arrangement of those spectral filters 
on the sensor is preferable to a regular mosaic layout as it improves HSI quality. Third, on 
the algorithmic and theoretical side, we generalize the classical inpainting problem by adding a 
linear interpolation step (upscaling) in the forward model enabled by an optical spatial lowpass 
filtering [7] common in demosaicing methods. Given a fixed number of observations, this linear 
operation allows to choose the target resolution and in particular, to reduce the subsampling factor 
by any desired value, making it easier for the inpainting method to fill the missing information. Our 
method is akin to CS in the sense that the formulations and goals are identical. The difference 
resides in the acquisition operation: where CS entails random linear mixtures as compressive 
measurements, our HSI device simply applies a direct subsampling. This work is also related to 
super-resolution as it allows to increase the number of pixels from the low multiplexed resolution 
of the focal plane. The proposed inpainting task relies on a greedy algorithm, the Pseudo-inverse 
IHT [20], slightly adapted to our needs for optimizing its convergence. Finally, we demonstrate 
through numerical simulations the potential of our design concept and reconstruction method for 
practical applications. 

The paper is structured as follows. The second section presents the acquisition device architec¬ 
ture and formulates the generalized inpainting problem. The third section presents the redundant 
dictionary used to promote a low-dimensional signal model, followed by a brief analysis of the PIHT 
algorithm. The fourth section presents the results of numerical simulations on both synthetic and 
real data. 

2 Hyperspectral Image Inpainting 


Ai6 

All 



7 

8 

6 

5 

15 

16 

14 

13 

11 

12 

10 

9 

3 

4 

2 

1 


Figure 1: Left: Illustration representing a 4x4 CMOS macropixel integrated with 16 Fabry-Perot filters (not all 
displayed). Right: True wavelength index selection in one such macropixel of the HSI mosaic imager [^. 


This work proposes a HSI sensor design on what could be called “direct domain-compressed 
sensing”, z.e., random subsampling. This architecture is associated to a snapshot spectral sensor 
that monolithically integrates a set of Fabry-Perot interferometers (FPI) [13] organized as pixel- 
level filter mosaics on top of a standard, off-the-shelf CMOS sensor, the focal plane array (FPA). 
Each FPI is actually made of a transparent layer sandwiched between two mirrors, the layer width 
determining the selected wavelength (See Fig. [^. 
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Let X G be a hyperspectral (HS) volume of a target scene (L is the number of bands) 

and Y G M^^xmj represent the recording at the focal plane array (FPA) with the same spatial 
resolution as X {Nj = m/, Nj = mj). The pixels of Y are a selection {e.g.^ random) of a group 
of voxels of the whole volume X] more precisely, each pixel yij of Y corresponds to a voxel 
of X with the same spatial indices (i,j) and a specific wavelength index A G [L] := {1, • • • ,1/}, 
e.g.^ selected by a FP filter. 

One emphasis of this paper is to assume the existence of an optical filtering of the spatial 
frequencies of the volume X that ensures that each spectral band is spatially band-limited {e.g., 
with birefringent filtering ED- As will become clearer hereafter, this ensures that each spectral 
band is spatially smooth or band-limited. This practice is common for stabilizing RGB demosaic- 
ing [g. This can be done for instance by slightly defocusing the objective lens of the imager under 
the assumption that the observed scene is well contained in the optical depth-of-field (DOF). In 
this context, we can target the reconstruction of a smaller spatial resolution and aim to recon¬ 
struct an undersampled volume X G l^’^j^xnjxL imposing nj < mj and nj < mj. In other 

words, the compressive acquisition of the HS volume can be mitigated by varying N = niUjL. 
Typically, if L = 16, taking nj = mi/2 and nj = mj/2 the number of observations reads 
M := mjmj = AN/L = A^/4. 

Mathematically, this means that there exists an operator Up G that, in matrix 

form, 

xx = Up xx (1) 

where Xx = vec(XA) G (resp. Xx = vec(XA) G M^) is the vectorized form of the spectral 

band Xx (resp. Xx) of X (resp. X). 

To understand the structure of this operator, let us consider it in a simplified situation, z.e., 
in the sampling model of a one dimensional “monochromatic image” f{t) G I/^(M) on a spatial 
domain parametrized by t. At a low resolution with spatial discretization period T > 0, the 
continuous light intensity is acquired as 

fin] = fZo = fi'riT), 

with n e Z the indices of each pixel. At a finer resolution with sampling period 0 < T' < T (e.^., 
T' = T/4), we would have samples 

9{k] = JZofmt-kr)dt = fikT), ( 2 ) 

with k G Z. Let us assume the existence of a generative (interpolative) model of / 

fit) ~ fit) = E“=-oo f{n]<fnit). (3) 

To ensure that the model error / — / is minimal, the signal / has to be sufficiently smooth: a 
condition that can be enforced by an optical filtering, as assumed in this paper. In particular, in 
the case of the Shannon-Whit taker formula with ipn{t) = sine ( ^~jf^ ), f = f if f has no frequencies 
higher than 1/2T (band-limited), otherwise aliasing occurs. Depending on the properties of /, 
other basis functions ipn, possibly with finite support, are also possible for minimizing the model 
error [25] . 

Since allows us to estimate a continuous signal from the samples f[n] at low resolution, 
assuming a vanishing model error and combining this equation with ([^ provides 

9[k] = T,n=-^f[n]u[k,n], (4) 

with u[k,n] = ipn{kT') defining the entries of our up-sampling operator Up in this simplified 
exposition. Rather than the HR and impractical filters of the Shannon-Whittaker generative 
model, we use here the 2-D FIR Lanczos filter [8]. 

We are now ready to express the general sensing model of our HS sensing. Since there is ideally 
no spectral dispersion in the device and each FPA cell is sensitive to exactly one wavelength, we 
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can separate the FPA pixels in subsets, according to their wavelength, and consider that each 
band of the hyperspectral volume X is acquired independently with a different subset of pixels. 
Let us denote by Mx G {0, the diagonal mask operator selecting the FPA pixels of the 

band A. By construction of the FPA, we have ^x = I MxM and MxMx' = 0 for A 7 ^ A'. 

Writing y = vec(l^) and x = (ccf, • • • and defining the general sensing matrix ^ := 

($ 1 , • • • , ^l) with •= ATaUp, our generalized inpainting problem corresponds now to recover 
the target HS volume x from the possibly noisy observations y in 

i/ = ^x + n, (5) 

with n an additive noise of bounded energy. 

3 Signal Prior and Reconstruction 

Two main families of CS reconstruction methods exist for estimating x from (§. Convex relax¬ 
ations of the associated sparsely-regularized (therefore combinatorial) inverse problem, such as the 
Basis Pursuit DeNoising (BPDN) program, were among the first methods to be used in CS [9]. 
The second family contains greedy algorithms, e.^.. Iterative Hard Thresholding (IHT) [15], that 
aim at solving approximately the combinatorial problem. Originally, CS theory was only adapted 
to exploit sparsity of the signal in an orthonormal basis but alternatives have emerged since then. 
In compressive HSI, [161 [171 HH] used structured sparsity, total variation and low rank. In [19], 
the authors proposed to use an adapted dictionary containing the known spectra of each ma¬ 
terial potentially present in the scene. Experience from other data restoration problems (e.^., 
denoising, deconvolution) have shown that using redundant dietionaries instead of orthonormal 
bases often lead to dramatic improvement. On the website associated with the book [ 20 ], Gabriel 
Peyre proposed a modified version of IHT using the Pseudo-Inverse of a redundant analysis dic¬ 
tionary after the thresholding step. In this section, we present this algorithm called here PIHT 
for Pseudo-Inverse IHT. It is related to AIHT [21] and SSIHT [22], two extensions of IHT for 
signals respectively in the eosparse analysis [23] and in the sparse synthesis [24] models. How¬ 
ever, compared to these two algorithms, each iteration of PIHT is faster to compute as efficient 
algorithms exist for the computing the pseudo-inverse of several redundant transforms. This is 
explained below after the definition of our signal prior. 

Sparse analysis model: Since the inverse problem introduced in Sec. is underdetermined 
(M < N)^ prior information about X must be assumed to enforce the solution unicity. In this 
work, we adopt an analysis sparse model for the HS volume. The bands Xxoi X are first expected 
to lead to sparse coefficients when analyzed through an Undecimated Discrete Wavelets Transform 
(UDWT), z.e., a shift-invariant, redundant version of the Discrete Wavelet Transform [25] [27]. 
Second, the targeted individual pixel spectra are assumed smooth and we select therefore a Discrete 
Cosine Transform (DCT) basis for “sparsifying” these spectra. These two analysis priors are then 
combined by tensorial product, z.e., we form 

A = Audwt G Adct 

with (g) the Kronecker product, such that the transformed volume Ax G is sparse. The UDWT 
uses the biorthogonal Daubechies wavelets of length 8 on 3 levels (such that P = ION). Moreover, 
the UDWT scaling coefficients are further transformed with a 2-D DCT, as those low-frequency 
UDWT bands should be sparser after such a transform. This solution led to better results than 
avoiding this additional DCT transform or simply discarding these unsparse bands from the prior. 
Finally, all reconstruction methods described below are constrained to provide solutions within a 
realistic intensity range, z.e., x G 7^ = [0,Xniax]^ for some known Xmax > 0, or ^ with 

the orthogonal projector on IZ. 

Pseudo-Inverse IHT: We have selected in this work the Pseudo-Inverse IHT (PIHT) [20], z.e., 
a greedy algorithm that aims at approaching the following non-convex problem: 

x^ = argmin^^T^ ||^u — y\\ s.t. 1-Lk{Au) = An, ( 6 ) 
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Figure 2: Both reference HSI volumes (false RGB color). Left: Groundtruth for synthetic experiments {'"''Fresh Fruits”). 
Right: Naive demosaicing of data from real sensor {''^Leaves”). 


where the hard thresholding operator sets to 0 all but the K largest entries of u. 


Algorithm 1 Pseudo-Inverse Iterative Hard Thresholding 

1: input: 

2: CCq, Riter? •> ^ [^] 5 

3: initialization: 

4: := Xq] 

5: for 5 = 1 to S' do 
6: r" := ||#*(y - 

7: a* + 

8: a:*+i := Vn [A^Uk^ (Aa")] 

9: end for 


PIHT is described in Alg. In order to maximize efficiency and speed, PIHT is initialized 
with a point that is not “too far” from cc* in (§, le., Xq is obtained by a standard 3-D linear 
interpolation of the unknown samples of X from those sampled by Y and resizing the result to 
the dimensions of X. Inspired by |28[ l29] . the number of non-zero components K is progressively 
increased for the hard thresholding during the S iterations. The initial and final values are 
set following a heuristic^ depending on the statistical distribution of ap = Axq as a proxy for the 
one of the unknown Ax. This distribution being heavy tailed, we took and as the number of 
coefficients bigger than the “mean plus a” and the “mean plus 2.5cr” of the distribution of log |ao|, 
respectively. Notice that the value in Alg. is the r minimizing \\y — -hr^* {y — ^x^) ) ||. 

Let us conclude this section by mentioning that, compared to AIHT [21] and SSIHT [52], 
PIHT is faster to compute at each iteration as fast algorithms exist to estimate the vector-matrix 
product involving the pseudo-inverse A^ at line in Alg.[^ Those methods rely on the knowledge 
of the dual transform of our UDWT system [25|. However, convergence guarantees for PIHT are 
still unknown. We can just state that, if cc G 7^ is such that y = ^x (noiseless sensing), then, x 
is a fixed point of the PIHT iterations if and only if Ax is A-sparse. For space reason, we omit 
here the proof of this fact. 

4 Numerical Results 

This section demonstrates the capability of the proposed method. In the first part, we use a 
simulated model of the proposed architecture. The synthetic measurements are produced from an 
existing real high resolution HSI dataset (Fig. left [26]). This allows a quantitative comparison 
of the quality in several experimental conditions reproduced in a controlled numerical environ¬ 
ment. The second part uses compressed data acquired with the existing mosaic version of the 
sensor described below and in i (Fig-i right). For this imager, a naive demosaicing method is 
implemented in [6]. Notice that in all experiments, the number of iterations for PIHT was set to 


5 
























Figure 3: From top to bottom: Groundtruth; Naive demosaic 19.1 dB ; PIHT (mosaic) 27.03 dB; 3DLinterp (random) 
23.5 dB; PIHT (random) 30.8 dB. Displayed bands are 530nm (green, A = 4), 733nm (deep red, A = 10) and 935nm (infra 
red, A = 16). 


S = 200. 

Synthetic Data and Methods Comparison: In this setup, the simulated sensor has a resolu¬ 
tion M of 512x512 pixels. The pixel grid is partitioned in L = 16 different subsets corresponding 
to 16 FPI at wavelengths ranging between visible and near infra-red light. This paper proposes 
to test two different pixel subsets organizations: a mosaic of 128x 128 identical 4x4 macro-pixels 
as for the existing sensor [6], or a fully scrambled arrangement simply obtained by permuting 
randomly the pixel indices over the whole FPA, hence preserving the conditions on masks {Mx} 
(Sec. [^. In Table we compare, for the random case, the 3-D linear interpolation (SDLinterp, 
also used as initialization of other methods), with a convex reconstruction method (ABPDN j3Q] 
solved with the algorithm from (STJ |32] |33] ) and PIHT. Three target resolutions are tested: the 
Nyquist rate 128x128x16; an intermediate case (available thanks to our generalized inpainting 
formulation) 256x256x16; and a fully compressive (plain inpainting) 512x512x16 setup. The 
convex method was stopped after 2000 iterations, possibly before full convergence. This probably 
explains its low reconstruction SNR levels. Speed was the main motivation for using a greedy 
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N = . 

M 

N = AM 

N = 16M 


SNR (dB) 

time 

SNR (dB) 

time 

SNR (dB) 

time 

3DLinterp 

28.3 

34" 

25.3 

33" 

23.5 

33" 

ABPDN 

30.4 

59'29" 

17.4 

2h59' 

15.6 

12h52' 

PIHT 

32.9 

4'26" 

33.6 

15'6" 

30.9 

55'3" 


Table 1: Comparison of the initialization xq (SDLinterp) a convex method (ABPDN) and PIHT in terms of SNR 
(dB) and running time for different target sizes, given an acquisition size, on a synthetic example (generated from 
the “Fresh Fruit” data set). 



Figure 4: From top to bottom: Naive demosaic; SDLinterp; PIHT. Displayed bands are 506nm (bluish green, A = 4), 
567nm (green-yellow, A = 10) and 628nm (red, A = 16). 


method in this work but PIHT SNRs also appeared to clearly outperform those of APBDN in 
this setup. Note that when the super-resolution factor N/M increases, while the reconstruction 
SNR degrades rapidly for SDLinterp and ABPDN, it remains almost constant for PIHT. However, 
horizontal comparison in the table has to be interpreted with care since the SNR may depend on 

N. 

On Fig. 1^ we fix A" = 16M and compare the existing mosaic sensor (rows 2 and 3) with the 
random layout (4 and 5). We also compare PIHT (3 and 5) with a naive demosaicing (nearest 
neighbor interpolation on each band individually, row 2) and SDLinterp method (row 4). Observe 
first that PIHT gives almost perfect visual quality even if the super-resolution factor N/M is as 
high as 16. The advantage of the random pattern over the mosaic is best seen by looking at band 
16 and at the spectrum (c) from the edge of the wicker basket. Both naive and SDLinterp methods 
clearly show bad performances compared to PIHT. Finally, for the sake of completeness, we have 
added noise to the simulated compressive data and studied the average reconstruction SNR that 
was obtained with PIHT in the case N = AM. For an input SNR of 10 dB, 20 dB and 30 dB, we 
observed a reconstruction SNR of 18.9 dB, 27.0 dB and 32.3 dB, respectively, demonstrating the 
robustness of the method. 

Real Data Reconstruction: Here, the real sensor [6] was used. This imager has a resolution 
M of 1024X 1024 pixels organized in a mosaic of 256x256 identical 4x4 macro-pixels; each with 
L = 16 different FPI at wavelengths of visible light (as in Fig.[^. This time, the results on Fig.[^ 
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can only be compared qualitatively since no groundtruth is available. The super-resolution factor 
N/M here is 4, be., we recover a volume with 512 x 512 x 16 voxels. The naive method that was 
used prior to our work (top row) is clearly the worst. Notice that the performances of SDLinterp 
seems much better than in Fig.|^ However, in the electronic version of this article, you can zoom 
on the middle row of Fig.j^and see a “grid artifact” that is almost completely removed with PIHT 
(bottom row). 

5 Conclusion 

In this paper, a novel hyperspectral imaging technique based on the snapshot imager from [6] 
was presented. We proposed a flexible generalized inpainting formulation inspired by compressed 
sensing and aiming at demosaicing and increasing the resolution of the acquired data volume. 
The formulation permits to choose arbitrarily the targeted resolution, allowing a tradeoff between 
quality and size. Randomly scrambling the distribution layout of the Fabry-Perot interferometers 
helped to stabilize further the reconstruction method. We regularized the inpainting problem with 
a redundant dictionary analysis sparsity prior and solved it using PIHT, a fast greedy iterative 
algorithm. We finally demonstrated the performances of our method, both qualitatively and quan¬ 
titatively, through numerical simulations on synthetic and real data. As in [6], our method allows 
fast snapshot acquisition but with respect to the original imager, it dramatically increases the 
resulting resolution. The main shortcoming is that spectral resolution is limited to a few wave¬ 
lengths. While it can suffice in many applications, others may require a finer spectral resolution. 
This is also a big difference with respect to other research in compressive HSI [10]. Future works 
will include filter layout optimization or integration of true spectral filter responses for increased 
spectral resolution. Combining PIHT reconstruction with coded aperture compressive FP filtered 
HSI architectures m is also envisioned. 
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